unseen attack
- North America > United States > Virginia (0.05)
- North America > United States > Maryland (0.05)
- Atlantic Ocean > North Atlantic Ocean > Chesapeake Bay (0.05)
Supplementary Material: Dual Manifold Adversarial Robustness: Defense against Lp and non-Lp Adversarial Attacks. A OM-ImageNet Details
As pre-processing, each image was center-cropped to produce a square image and converted to 256×256 resolution. In Figure 1, we present x_i (Original) and g(w_i) (Projected). Figure 1: Visual comparison between original images and projected images. We use the SGD optimizer with the cyclic learning rate scheduling strategy in [10] (see Figure 2), momentum 0.9, and weight decay 5. For the unseen attacks proposed in [11], we consider the attack parameters presented in Table 3. We study how different choices affect the robustness of the trained networks against unseen attacks.
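The cyclic learning-rate strategy referenced above can be illustrated with a minimal sketch. This is a hypothetical triangular one-cycle schedule in plain Python; the exact schedule used in [10] and the peak rate `lr_max` are assumptions for illustration, not values from the paper:

```python
def cyclic_lr(step, total_steps, lr_max=0.21, lr_min=0.0):
    """Triangular cyclic schedule: ramp linearly up to lr_max at the
    midpoint of training, then linearly back down to lr_min."""
    mid = total_steps / 2
    if step <= mid:
        return lr_min + (lr_max - lr_min) * step / mid
    return lr_max - (lr_max - lr_min) * (step - mid) / mid
```

In a training loop, the returned value would be assigned to the optimizer's learning rate at each step; the appeal of the one-cycle shape is fast warm-up plus an annealing tail without manual milestone tuning.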
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.05)
- Asia > Middle East > Jordan (0.05)
CausalDiff: Causality-Inspired Disentanglement via Diffusion Model for Adversarial Defense
Despite ongoing efforts to defend neural classifiers from adversarial attacks, they remain vulnerable, especially to unseen attacks. In contrast, humans are hard to deceive with subtle manipulations, since we make judgments based only on essential factors. Inspired by this observation, we attempt to model label generation with essential label-causative factors and incorporate label-non-causative factors to assist data generation. For an adversarial example, we aim to discriminate the perturbations as non-causative factors and make predictions based only on the label-causative factors. Concretely, we propose a causal diffusion model (CausalDiff) that adapts diffusion models for conditional data generation and disentangles the two types of causal factors by learning towards a novel causal information bottleneck objective. Empirically, CausalDiff has significantly outperformed state-of-the-art defense methods on various unseen attacks, achieving an average robustness of 86.39\% (+4.01\%) on CIFAR-10, 56.25\% (+3.13\%) on CIFAR-100, and 82.62\% (+4.93\%) on GTSRB (German Traffic Sign Recognition Benchmark).
Adversarial Déjà Vu: Jailbreak Dictionary Learning for Stronger Generalization to Unseen Attacks
Dabas, Mahavir, Huynh, Tran, Billa, Nikhil Reddy, Wang, Jiachen T., Gao, Peng, Peris, Charith, Ma, Yao, Gupta, Rahul, Jin, Ming, Mittal, Prateek, Jia, Ruoxi
Large language models remain vulnerable to jailbreak attacks that bypass safety guardrails to elicit harmful outputs. Defending against novel jailbreaks represents a critical challenge in AI safety. Adversarial training -- designed to make models robust against worst-case perturbations -- has been the dominant paradigm for adversarial robustness. However, due to optimization challenges and difficulties in defining realistic threat models, adversarial training methods often fail on newly developed jailbreaks in practice. This paper proposes a new paradigm for improving robustness against unseen jailbreaks, centered on the Adversarial Déjà Vu hypothesis: novel jailbreaks are not fundamentally new, but largely recombinations of adversarial skills from previous attacks. We study this hypothesis through a large-scale analysis of 32 attack papers published over two years. Using an automated pipeline, we extract and compress adversarial skills into a sparse dictionary of primitives, with LLMs generating human-readable descriptions. Our analysis reveals that unseen attacks can be effectively explained as sparse compositions of earlier skills, with explanatory power increasing monotonically as skill coverage grows. Guided by this insight, we introduce Adversarial Skill Compositional Training (ASCoT), which trains on diverse compositions of skill primitives rather than isolated attack instances. ASCoT substantially improves robustness to unseen attacks, including multi-turn jailbreaks, while maintaining low over-refusal rates. We also demonstrate that expanding adversarial skill coverage, not just data scale, is key to defending against novel attacks. \textcolor{red}{\textbf{Warning: This paper contains content that may be harmful or offensive in nature.}}
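The claim that "unseen attacks can be effectively explained as sparse compositions of earlier skills" can be sketched as a greedy cover over a skill dictionary. All skill names and feature tags below are hypothetical illustrations, not primitives from the paper:

```python
def explain_attack(attack_features, skill_dict, max_skills=3):
    """Greedily pick skill primitives whose feature sets cover the
    features of a new attack; returns chosen skills and any residue
    the dictionary cannot explain."""
    remaining = set(attack_features)
    chosen = []
    while remaining and len(chosen) < max_skills:
        best = max(skill_dict, key=lambda s: len(skill_dict[s] & remaining))
        if not skill_dict[best] & remaining:
            break  # no skill explains anything further
        chosen.append(best)
        remaining -= skill_dict[best]
    return chosen, remaining

# Hypothetical dictionary of skill primitives
skills = {
    "roleplay": {"persona", "fiction"},
    "encoding": {"base64"},
    "payload_split": {"chunking"},
}
```

A small residue after the greedy pass would correspond to the genuinely novel portion of an attack; the paper's observation is that this residue shrinks as skill coverage grows.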
- North America > United States > Virginia (0.04)
- Africa > Kenya (0.04)
- Information Technology > Security & Privacy (1.00)
- Government > Military (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
Dual Manifold Adversarial Robustness: Defense against Lp and non-Lp Adversarial Attacks. A OM-ImageNet Details. A.1 Overview
Figure 1: Visual comparison between original images and projected images.
All the classification models are trained using two P6000 GPUs with a batch size of 64 for 20 epochs. We study how different choices affect the robustness of the trained networks against unseen attacks.
Table 4: Classification accuracy against unseen attacks applied to the OM-ImageNet test set.
Table 5: Classification accuracy against known (PGD-50 and OM-PGD-50) and unseen attacks. Brighter colors indicate larger absolute differences.
- North America > Canada (0.05)
- Asia > Middle East > Jordan (0.05)
- Information Technology > Security & Privacy (0.51)
- Government > Military (0.41)
In this work, we consider the scenario where the manifold information is exact, and show that this information can be very useful for improving robustness to novel attacks.
How can DMAT be exploited for standard tasks/datasets? PGD should not be viewed as the strongest attack for evaluation. Results are shown in Table B; results for DMAT are presented in the last column of Table B. Other strong baselines such as TRADES should be included in the main paper. The notion of "manifold" should be clarified; we will explain this further in the paper.
Robustness Feature Adapter for Efficient Adversarial Training
Wu, Quanwei, Guo, Jun, Wang, Wei, Wang, Yi
Adversarial training (AT) with projected gradient descent is the most popular method for improving model robustness under adversarial attacks. However, computational overheads become prohibitively large when AT is applied to large backbone models. AT is also known to suffer from robust overfitting. This paper contributes to solving both problems simultaneously, towards building more trustworthy foundation models. In particular, we propose a new adapter-based approach for efficient AT directly in the feature space. We show that the proposed adapter-based approach can improve the inner-loop convergence quality by eliminating robust overfitting. As a result, it significantly increases computational efficiency and improves model accuracy by generalizing adversarial robustness to unseen attacks. We demonstrate the effectiveness of the new adapter-based approach in different backbone architectures and in AT at scale.
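A feature-space adapter of the kind described can be sketched as a residual bottleneck module. This NumPy toy (the dimensions, ReLU nonlinearity, and zero-initialized up-projection are assumptions, not the paper's architecture) shows why such an adapter starts as an identity map and only perturbs the backbone's features once trained:

```python
import numpy as np

class FeatureAdapter:
    """Residual bottleneck adapter: down-project, nonlinearity,
    up-project, then add back to the input features."""
    def __init__(self, dim, bottleneck, seed=0):
        rng = np.random.default_rng(seed)
        self.W_down = rng.normal(0.0, 0.02, (dim, bottleneck))
        # Zero-initialized up-projection: the adapter is an exact
        # identity at the start of training.
        self.W_up = np.zeros((bottleneck, dim))

    def __call__(self, h):
        z = np.maximum(h @ self.W_down, 0.0)  # ReLU in the bottleneck
        return h + z @ self.W_up              # residual connection
```

Because only the small adapter matrices would be trained during AT, the frozen backbone's forward pass can be computed once per batch, which is one plausible source of the efficiency gains the abstract describes.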
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > China > Hong Kong (0.04)
Unseen Attack Detection in Software-Defined Networking Using a BERT-Based Large Language Model
Swileh, Mohammed N., Zhang, Shengli
Software-defined networking (SDN) represents a transformative shift in network architecture by decoupling the control plane from the data plane, enabling centralized and flexible management of network resources. However, this architectural shift introduces significant security challenges, as SDN's centralized control becomes an attractive target for various types of attacks. While current research has yielded valuable insights into attack detection in SDN, critical gaps remain. Addressing challenges in feature selection, broadening the scope beyond DDoS attacks, strengthening attack decisions through multi-flow analysis, and building models capable of detecting unseen attacks that they have not been explicitly trained on are essential steps toward advancing security in SDN. In this paper, we introduce a novel approach that leverages Natural Language Processing (NLP) and the pre-trained BERT-base model to enhance attack detection in SDN. Our approach transforms network flow data into a format interpretable by language models, allowing BERT to capture intricate patterns and relationships within network traffic. By using Random Forest for feature selection, we optimize model performance and reduce computational overhead, ensuring accurate detection. Attack decisions are made based on several flows, providing stronger and more reliable detection of malicious traffic. Furthermore, our approach is specifically designed to detect previously unseen attacks, offering a solution for identifying threats that the model was not explicitly trained on. To rigorously evaluate our approach, we conducted experiments in two scenarios: one focused on detecting known attacks, achieving 99.96% accuracy, and another on detecting unseen attacks, where our model achieved 99.96% accuracy, demonstrating the robustness of our approach in detecting evolving threats and improving the security of SDN networks.
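The "transforms network flow data into a format interpretable by language models" step can be sketched as key=value serialization of a flow record into a token string a BERT tokenizer could consume. The field names below are hypothetical placeholders, not the paper's actual Random-Forest-selected feature set:

```python
def flow_to_text(flow):
    """Serialize a network-flow record (dict) into a space-separated
    key=value token string, in a fixed field order so that the same
    feature always lands in the same position."""
    fields = ["proto", "src_port", "dst_port", "pkts", "bytes", "duration"]
    return " ".join(f"{k}={flow[k]}" for k in fields if k in flow)
```

A fixed field order matters: it gives the language model a consistent positional layout across flows, so attention can learn stable relationships between features.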
- Workflow (1.00)
- Research Report > Promising Solution (0.54)
- Overview > Innovation (0.34)
- Information Technology > Security & Privacy (1.00)
- Government > Military (1.00)